Protein folding on rugged energy landscapes: Conformational diffusion on fractal networks

We employ simulations of model proteins to study folding on rugged energy landscapes. We
construct "first-passage" networks as the system transitions from unfolded to native states. The
nodes and bonds in these networks correspond to basins and transitions between them in the energy
landscape. We find power-law scaling between the folding time and the number of nodes and bonds.
We show that these scalings are determined by the fractal properties of first-passage networks.
Reliable folding is possible in systems with rugged energy landscapes because first-passage
networks have small fractal dimension.



Understanding how proteins reliably fold to their native
conformations despite frustration in the form of non-native
interactions between residues is an important, open
question. Advances in experimental techniques, such as
single-molecule fluorescence [1] and fast thermal quenching
methods [2], have enabled a quantitative characterization
of the dynamics that occur during folding of single
proteins. For example, we now know that a large number of
metastable conformations are sampled during the folding and
unfolding processes, as observed in folding stability [3]
and mechanical denaturation [4] studies.

How does a protein fold reliably to its native conformation
even though a large number of metastable states exist? For
over twenty years the answer to this question has been the
principle of minimal frustration [5]. Within this
framework, one recognizes that metastable states are
present, but assumes that the barriers separating local
energy minima are sufficiently low that there is still a
large thermodynamic force driving folding to the native
state [6]. This idea is illustrated by the funneled energy
landscape in Fig. 1(a), where the roughness scale
$\delta E$ is much smaller than the depth of the energy
minimum $\Delta E$ that drives folding
($\delta E \ll \Delta E$). While the funneled energy
landscape may explain how some proteins fold reliably, a
different picture, i.e., rugged energy landscapes, may
describe folding in metastable [8] and intrinsically
disordered [9] proteins, as well as misfolding [10]. Rugged
energy landscapes, as shown in Fig. 1(b), possess a
roughness scale that is comparable to the depth of the
smooth funnel, $\delta E \sim \Delta E$. In this limit, the
thermodynamic drive to fold is absent on biological
timescales, and protein conformational dynamics proceed via
activation over energy barriers with only local knowledge
of the landscape.

What physical observables differentiate proteins with
funneled versus rugged landscapes? This is a difficult
question to answer since, although funneled energy
landscapes have been studied extensively, virtually no
research has focused on reliable folding in proteins with
rugged energy landscapes. We take a crucial first step
in answering this question by studying the properties of

FIG. 1: Schematics of (a) funneled and (b) rugged energy
landscapes. In (a), the depth of the energy minimum that
drives folding $\Delta E \gg \delta E$, where $\delta E$
gives the root-mean-square energy fluctuations over the
given range of the reaction coordinate. In (b),
$\Delta E \sim \delta E$.

a model protein that reliably folds to its native state on 
a rugged energy landscape with 10^ — 10'* distinct basins 
sampled during folding. (A basin is a region of
configuration space, or collection of conformations, that
relaxes to a single local energy minimum when thermal
fluctuations are suppressed.) Instead of discrete pathways
through the energy landscape, we find a statistical
ensemble of pathways with large fluctuations in folding
times.
The folding time and number of distinct basins sampled 
during folding scale as a power-law, which suggests that 
reliable folding on rugged landscapes can be described as 
conformational diffusion on a fractal network of basins. 

Heteropolymer model: To study proteins with rugged energy
landscapes, simulation models should possess three key
features: (1) a unique native state, (2) many metastable
local energy minima, and (3) large energy barriers that
separate local minima so that $\delta E \sim \Delta E$.
Further, we must be able to search configuration space
in a reasonable amount of computer time, which excludes
all-atom simulations. In these studies, we focus on a
model heteropolymer that exhibits features (1)-(3).

We model proteins as heteropolymers composed of equal-sized
spherical monomers with hydrophobic and hydrophilic
interactions [13]. The model includes hydrophilic monomers
(white) and two types of hydrophobic monomers (red and
green), as shown in Fig. 2. Green and red monomers interact
via an attractive Lennard-

FIG. 2: (Color online). The heteropolymer model in its
(a) extended, (b) metastable misfolded, and (c) native
states. (d) Schematic of a first-passage network (black
dashed lines) from basin 'S' to 'F', superimposed on the
complete network composed of all basins and transitions
between them (gray lines).

Jones potential with minimum energy $-E_{\rm att}$, except
the green monomers on both ends of the chain that interact
with minimum energy $-2E_{\rm att}$. All other
monomer-monomer interactions are purely repulsive [13]. We
also include a FENE potential [14] between adjacent
monomers to maintain the polymer constraint. We simulate
the 18-mer sequence ggggwwwrrrrwwwgggg, where g, w, and r
represent green, white, and red monomers, respectively.
This model displays a complex energy landscape with
~ 10^ distinct local energy minima. For simplicity, local
minima are defined by the list of contacting green and red
monomers [15]. The native conformation of this
heteropolymer is given by the particular set of 14
green-red contacts shown in Fig. 2(c).
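
The basin assignment described above, in which a conformation is labeled by its list of contacting green and red monomers, can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function name `contact_list`, the contact cutoff of 1.5 monomer diameters, and the restriction to green-red pairs are assumptions made here for concreteness.

```python
from itertools import combinations
from math import dist

# 18-mer sequence quoted in the text: g = green, w = white, r = red.
SEQUENCE = "ggggwwwrrrrwwwgggg"

# Assumed contact distance in units of the monomer diameter (hypothetical value).
CONTACT_CUTOFF = 1.5


def contact_list(positions, sequence=SEQUENCE, cutoff=CONTACT_CUTOFF):
    """Return the set of green-red index pairs (i, j), i < j, closer than cutoff.

    positions is a sequence of (x, y, z) coordinates, one per monomer.
    The returned frozenset is hashable, so it can serve as a basin label.
    """
    contacts = set()
    for i, j in combinations(range(len(sequence)), 2):
        is_green_red = {sequence[i], sequence[j]} == {"g", "r"}
        if is_green_red and dist(positions[i], positions[j]) < cutoff:
            contacts.add((i, j))
    return frozenset(contacts)


# A fully extended chain has no green-red contacts.
extended = [(float(i), 0.0, 0.0) for i in range(len(SEQUENCE))]
print(len(contact_list(extended)))
```

Under this labeling, two conformations that return the same set are treated as belonging to the same basin.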

Thermal fluctuations of the heteropolymer are studied
using Brownian dynamics, where the temperature $T$ is
reported in units of the attractive energy, e.g.,
$T = 1/3$ corresponds to thermal energy $E_{\rm att}/3$. To
compare results for rugged and funneled energy landscapes,
we also simulated the same heteropolymer with
Go-interactions [16], where attractive interactions are
only included between monomers that form contacts in the
native state. The simplest measures of kinetics are the
folding and unfolding times shown in Fig. 3. The folding
time $\tau_f$ is calculated by preparing the heteropolymer
in an ensemble of extended states and measuring the average
time to reach the native state. The unfolding time $\tau_u$
is the average time from the native state to any extended
state with zero red-green contacts. For temperature
$T < T^* = 0.8$, $\tau_f < \tau_u$ and the extended
conformation is significantly less stable than the native
state. The increase in $\tau_f$ as $T$ decreases, as shown
in Fig. 3, has been observed in experimental studies of
proteins [17] and is a general feature of materials
quenched below the glass transition [18] when energy
barriers become large compared to $T$. An important feature
of the heteropolymer model is that folding only occurs for
temperatures where $d\tau_f/dT < 0$. In contrast, folding
simulations of the Go-model yield $d\tau_f/dT > 0$ for all
$T$, as shown in the inset to Fig. 3.
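
The text states only that Brownian dynamics is used and does not give the integration scheme. As a hedged illustration of what a single update might look like, the sketch below assumes overdamped Langevin dynamics integrated with an Euler-Maruyama step, with temperature measured in energy units ($k_B = 1$). The function name `brownian_step`, the time step, and the friction coefficient are hypothetical choices, not values from the paper.

```python
import random
from math import sqrt


def brownian_step(positions, forces, temperature, dt=1e-3, friction=1.0, rng=random):
    """Advance every monomer by one overdamped (Euler-Maruyama) step.

    positions and forces are sequences of (x, y, z) tuples. Each coordinate
    moves by a deterministic drift, force * dt / friction, plus Gaussian
    noise with standard deviation sqrt(2 * temperature * dt / friction).
    """
    noise_amplitude = sqrt(2.0 * temperature * dt / friction)
    return [
        tuple(
            x + f * dt / friction + noise_amplitude * rng.gauss(0.0, 1.0)
            for x, f in zip(position, force)
        )
        for position, force in zip(positions, forces)
    ]


# At zero temperature the step reduces to pure drift along the force.
print(brownian_step([(1.0, 0.0, 0.0)], [(-1.0, 0.0, 0.0)], temperature=0.0, dt=0.1))
```

In a full simulation the forces would come from the Lennard-Jones, repulsive, and FENE terms of the model, which are not reproduced here.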

First-passage networks: For each heteropolymer 

FIG. 3: Ensemble-averaged folding $\tau_f$ (squares) and
unfolding $\tau_u$ (circles) times vs temperature for the
heteropolymer (main figure) and Go (inset) models. The
vertical line at $T^* = 0.8$ indicates the folding
temperature.

conformation, we can determine the list of contacting 
green and red monomers and uniquely associate this list 
of contacts with a basin that surrounds the associated lo- 
cal energy minimum. For rugged landscapes, the system 
will sample a large number of basins as folding proceeds 
from the extended to the native state. The trajectory of 
the model protein as it folds can be viewed as a network of 
connected nodes in configuration space. Each node
represents the basin of a local energy minimum sampled by
the system, and each bond joining two nodes represents a
transition from one basin to another. These networks are
termed "first-passage networks" since they are formed as
the protein makes its first passage from an initial to the
native conformation. Note that each first-passage network
is a subset of all basins and transitions between them, as
illustrated in Fig. 2(d).

We compiled ~ 10^ first-passage networks originating
from the non-native conformation in Fig. 2(b) and ending
at the native state over a range of $T < 0.8$. We map
the conformation of the heteropolymer to its associated
basin every $q$ time steps to construct first-passage
networks. We assume that the features of the first-passage
networks depend on $T$ but are independent of the initial
state since the first-passage networks are composed of a
large number of nodes.

The simplest properties of first-passage networks are
the number of distinct basins sampled (nodes) $N_i$ and
the number of bonds $N_b$. Nodes and bonds are only counted
once, even if multiple transitions are made between a given
set of basins. We also measure the total number of
transitions $N_t \propto \tau_f$, where $N_t > N_b$. Fig. 4
shows raw data for the number of bonds $N_b$ and
transitions $N_t$ plotted versus the number of nodes $N_i$
using $q = 1000$. There are 850 data points for each
temperature, each taken from a distinct first-passage
network. For all $T$ the number of sampled basins, $N_i$,
fluctuates between 10^ and 10^, which indicates that the
model protein adopts a large number of conformations before
arriving at the native state. The wide range of $N_i$
indicates that there is not a single folding pathway, but
rather a statistical ensemble of pathways.
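
The bookkeeping for nodes, bonds, and transitions can be sketched as follows, given a trajectory of basin labels recorded at fixed sampling intervals. This is an illustration, not the authors' code. The function name `first_passage_network` is hypothetical, bonds are assumed to be undirected, and a transition is assumed to be counted only when consecutive samples fall in different basins.

```python
def first_passage_network(basin_trajectory, native):
    """Build a first-passage network from a sequence of basin labels.

    basin_trajectory: iterable of hashable basin labels, one per sample.
    native: label of the native basin; the walk stops on first arrival.

    Returns (nodes, bonds, n_transitions). Nodes and bonds are sets, so each
    is counted once however often it is revisited. n_transitions counts
    every hop between different basins, including repeats.
    """
    nodes, bonds = set(), set()
    n_transitions = 0
    previous = None
    for basin in basin_trajectory:
        nodes.add(basin)
        if previous is not None and basin != previous:
            bonds.add(frozenset((previous, basin)))  # undirected bond
            n_transitions += 1
        previous = basin
        if basin == native:
            break  # first passage to the native state
    return nodes, bonds, n_transitions


# Toy trajectory from basin 'S' to the native basin 'F'.
nodes, bonds, n_t = first_passage_network(["S", "A", "S", "A", "B", "F", "X"], "F")
print(len(nodes), len(bonds), n_t)  # → 4 3 5
```

In the toy example the hop between 'S' and 'A' is made three times but contributes a single bond, which is why the transition count exceeds the bond count.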

FIG. 4: (Color online). Number of (a) bonds $N_b$ and
(b) transitions $N_t$ in first-passage networks vs the
number of nodes $N_i$ over a range of temperature. For each
$T$, $N_b$ and $N_t$ have been multiplied by constant
factors (shifted vertically) for clarity.

In Fig. 4, $N_b$, $N_t$, and $N_i$ show strong fluctuations
from one realization to the next; however, the fluctuations
obey power-law scaling:

$$N_b \propto N_i^{\Delta} \quad \text{and} \quad N_t \propto N_i^{\Gamma}. \qquad (1)$$



This correlation is non-trivial and depends on global 
properties of first-passage networks. We find that distri- 
butions of local features of the network, such as single- 
jump activation times and distances, and the number of 
bonds per node, are exponential. Thus, local properties 
of first-passage networks cannot be responsible for the 
power-law scaling. 
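
A common way to estimate a power-law exponent from scattered data of this kind is a least-squares line fit in log-log coordinates. The sketch below illustrates that generic technique. It is not necessarily the fitting procedure used in the paper, and the function name `power_law_exponent` and the synthetic data are made up for the example.

```python
from math import log


def power_law_exponent(x, y):
    """Estimate the exponent a in y ∝ x**a by least squares on (log x, log y)."""
    log_x = [log(v) for v in x]
    log_y = [log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the best-fit line through the log-log points.
    s_xx = sum((a - mean_x) ** 2 for a in log_x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    return s_xy / s_xx


# Synthetic, noise-free data generated with exponent 1.5 (illustrative only).
nodes = [10, 100, 1000, 10000]
transitions = [3.0 * n ** 1.5 for n in nodes]
print(round(power_law_exponent(nodes, transitions), 6))  # → 1.5
```

With real first-passage data, each point would be one network realization, and the scatter about the fitted line reflects the realization-to-realization fluctuations.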

In Fig. 5 we plot the scaling exponents $\Gamma$ and
$\Delta$ at different temperatures $T$. While $\Delta$
reaches a plateau at $\approx 1.4$ at small $T$, $\Gamma$
continues to increase with decreasing $T$. The increase of
$\Gamma$ is a signature of temperature-dependent
exploration of configuration space in systems with rugged
landscapes. A system with a rugged energy landscape at
energy $E$ only samples a small temperature-dependent
fraction of conformations at that energy due to large
activation barriers. In contrast, $\Gamma \approx 1.5$ at
all $T$ for the same heteropolymer model with
Go-interactions. In systems with funneled energy landscapes
(i.e., the Go model), a protein with energy $E$ samples
conformations with that energy more uniformly.

The data shown in Fig. 4 are obtained by identifying
basins every $q = 1000$ time steps. We have also performed
simulations in the range 1 < q < 10*^ and observe that the
exponents $\Gamma$ and $\Delta$ are independent of $q$.
These




FIG. 5: (Color online). The scaling exponents $\Gamma$ and
$\Delta$ and the prediction $1/(\kappa d_f)$ for $\Gamma$
from Eq. 4. Error bars for $\Gamma$ and $\Delta$ are
smaller than the symbol size.

results further indicate that first-passage networks are 
self-similar and fractal. 

Origin of power laws: If we assume that first-passage
networks are fractal, we can predict the exponent $\Gamma$
from the fractal scaling exponents of the network. This
assumption will be verified a posteriori.

On any network we can define the chemical distance
$\Delta c$ given by the shortest path between two nodes of
the network. This distance is useful because it depends
only on network connectivity and is independent of the
embedding space [19]. For a fractal network, we expect

$$\Delta c \propto t^{\kappa}, \qquad (2)$$
$$N(\Delta c) \propto (\Delta c)^{d_f}, \qquad (3)$$

where $N(\Delta c)$ is the number of distinct basins
sampled within chemical distance $\Delta c$ and time
interval $t$, $d_f$ is the chemical fractal dimension, and
the exponent $\kappa$ characterizes the scaling of chemical
distance with time.
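
Because the chemical distance depends only on connectivity, it can be computed with a breadth-first search over the bonds of a network. The sketch below is a generic illustration. The function names `chemical_distances` and `basins_within` are hypothetical, and measuring distances from the starting basin is an assumption made here.

```python
from collections import deque


def chemical_distances(bonds, start):
    """Shortest-path (chemical) distance from start to every reachable node.

    bonds is an iterable of (node_a, node_b) pairs treated as undirected.
    Returns a dict mapping each reachable node to its number of bonds from start.
    """
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:  # standard breadth-first search
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def basins_within(distance, radius):
    """Number of basins whose chemical distance from the start is <= radius."""
    return sum(1 for d in distance.values() if d <= radius)


# Toy network: a chain S-A-B-F with one side branch A-C.
toy_bonds = [("S", "A"), ("A", "B"), ("B", "F"), ("A", "C")]
d = chemical_distances(toy_bonds, "S")
print(d["F"], basins_within(d, 2))  # → 3 4
```

Counting basins within a growing radius and fitting the result against that radius on log-log axes gives an estimate of the chemical fractal dimension.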

Given these relations, the correlation between $N_i$ and
$N_t$ can be explained as follows. A single first-passage
network is formed over folding time $\tau_f \propto N_t$,
during which the system explores average chemical distance
$\Delta c \propto N_t^{\kappa}$ (Eq. 2). Moreover, for a
given chemical distance $\Delta c$, the number of sampled
basins on the first-passage network scales as
$N_i \propto N(\Delta c) \propto (\Delta c)^{d_f}$ (Eq. 3).
Thus, both $N_i$ and $N_t$ are related to $\Delta c$, and
we find $N_t \propto N_i^{1/(\kappa d_f)}$, i.e.,

$$\Gamma = \frac{1}{\kappa d_f}. \qquad (4)$$
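
The elimination of the chemical distance that leads to Eq. 4 can be written out step by step. The derivation below uses only the proportionalities stated in the text (Eqs. 2 and 3 and the folding time being proportional to the number of transitions). The symbols are $\Gamma$ for the folding-time exponent, $\kappa$ for the exponent of Eq. 2, $d_f$ for the chemical fractal dimension, and $\Delta c$ for the chemical distance.

```latex
\begin{align*}
  \Delta c &\propto \tau_f^{\kappa} \propto N_t^{\kappa}
    && \text{(Eq. 2 with } \tau_f \propto N_t\text{)} \\
  N_i &\propto (\Delta c)^{d_f} \propto N_t^{\kappa d_f}
    && \text{(Eq. 3)} \\
  N_t &\propto N_i^{1/(\kappa d_f)}
    \quad\Longrightarrow\quad
    \Gamma = \frac{1}{\kappa d_f}
    && \text{(Eq. 4)}
\end{align*}
```

Since $\Gamma$ is inversely proportional to the product $\kappa d_f$, slower exploration of chemical distance or a smaller fractal dimension both raise the folding-time exponent.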



The prediction for $\Gamma$ relies on the first-passage
networks being fractal. In Fig. 6(a), we test Eq. 2 and
observe that $\Delta c$ grows as a power law at large $t$
for all temperatures studied. We average $\Delta c$ over
1500 first-passage networks and only include $t < \tau_f$
for each realization. The exponent $\kappa$ decreases with
decreasing $T$, which implies that colder systems explore
chemical distance more slowly.

In Fig. 6(b) we test Eq. 3 and find that, over the limited
range of chemical distance accessible to our small
heteropolymer, the chemical fractal dimension $d_f$ is
well-defined and depends linearly on temperature.
$N(\Delta c)$ is

FIG. 6: (Color online). (a) The mean chemical distance
$\Delta c$ sampled in the time interval $t$ by the
heteropolymer and (b) the mean number of basins
$N(\Delta c)$ within $\Delta c$ at different temperatures.
In (a) and (b), the symbols are the same as in Fig. 4, and
the insets display the scaling exponents used to fit the
data (dotted lines) for different temperatures.

computed by including all sampled basins in 850 different
first-passage networks at each $T$. While power-law scaling
of $N(\Delta c)$ only holds for $\Delta c < 8$, the average
chemical distance explored on a first-passage network is
always smaller than 8. Therefore, the prediction for
$\Gamma$ based on power-law scaling should hold during the
folding process. In Fig. 5 we find excellent agreement
between the folding-time exponent $\Gamma$ and our
prediction $1/(\kappa d_f)$.

We have studied first-passage networks formed by the
folding trajectories of a heteropolymer and observed
power-law scaling between the folding time
($\propto N_t$) and the number of nodes $N_i$ and bonds
$N_b$ in first-passage networks. We have also demonstrated
that the folding-time exponent $\Gamma$ can be obtained by
measuring the fractal exponents that characterize the
structure of first-passage networks in configuration space.

Our results do not describe properties of the complete 
network of basins in the energy landscape. However, as 
far as folding is concerned, our results suggest that this 
network is not relevant. Just as normal diffusion will 
trace out a two-dimensional fractal network of sampled 
nodes, no matter how large the dimension of the under- 
lying space is, proteins with rugged energy landscapes 
also trace out fractal networks that are independent of 
the complete network. This behavior is not peculiar to 
proteins with rugged energy landscapes, but is also
expected in glass-forming materials at low temperature
[21]. Moreover, $d_f$ decreases with temperature, and is
always much smaller than the dimension of configuration
space $D$, which implies that
$N_i \sim (\Delta c)^{d_f} \ll (\Delta c)^{D}$. This
provides a mechanism by which systems with rugged energy
landscapes can fold reliably without kinetic pathways and
offers a novel resolution to Levinthal's paradox [22].